IDENTIFIERS    *Harris Jacobson Readability Formula

ABSTRACT

This paper summarizes the work to date on the revised
Harris-Jacobson Readability Formulas. The contents include "The
Criterion," which discusses the criterion used in the development of
the formulas; "Variables Employed," which includes percent of
uncommon words, average sentence length, percent of long words, mean
number of letters per word, and spelling patterns; "Validity of the
Individual Variables," which presents the correlations obtained with
basal reader levels; "Readability Formulas Based on Multiple
Correlations," which discusses two-variable and three-variable
combinations which have been tried out and reports on the five best
formulas; "Corrected Grade Equivalents," which discusses the
corrections used for the Harris-Jacobson formulas; "Practical
Utility," which looks at the practical features of the readability
formulas; and "Further Research Needed," which discusses a planned
testing program and improving the readability formulas.
REVISED HARRIS-JACOBSON READABILITY FORMULAS
(College Reading Association, Bethesda, Maryland, Oct. 31, 1974)

A preliminary report on the Harris-Jacobson Readability Formulas was
given in May, 1973, at a convention of the International Reading Association.
Since that report much additional work has been completed, and new formulas
have been developed. The present paper summarizes the work to date.

The Criterion

Readability has been defined for our work as those characteristics
of reading material that make it easy or difficult to comprehend. The criterion
used in developing the formulas is based on the average characteristics of
six popular series of basal readers. These series had been used in the
development of the Harris-Jacobson Basic Elementary Reading Vocabularies (1972).
Computer processing made it possible to use a large number of samples.
From primer level up, ten about equally spaced samples were chosen from
each book, each sample having slightly more than 200 words. At preprimer
level as many 200-word samples were taken as the three preprimers of a series
could provide. There were 661 samples, totaling about 135,000 words. The
samples at a given level were all given the same reading grade value and
provided a scale ranging from 1.2 to 6.5. Since the samples at a reader
level are not all actually equal in difficulty, there is some inaccuracy
in this criterion. In the criterion scale there are seven steps at primary
levels but only three steps for grades 4-6.

Variables Employed

Variable 1 (V1). Per cent of uncommon words. Three word lists based on the
Harris-Jacobson word lists were tried out. One list, which contained only
the 335 first grade words and their inflected forms, was discarded when it
was found to lack discriminative ability above first grade. The Short
Readability List contains all first-grade and second-grade Core and
Additional words and their common inflected forms. It includes 912 root words
and 1,880 inflected forms, totaling 2,792 words. The first variable, V1
in our formulas, is the per cent of unique words not in this Short List.
Unique means that a word not in the list is counted only once per sample
regardless of how many times it may occur in that sample. Unique words was
found to have slightly better predictive ability than total number of
uncommon words.
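
A minimal sketch of how V1 might be scored by computer. The tokenizing rule, the lowercasing, the toy word list, and the choice of total words in the sample as the denominator are all assumptions made for illustration; the paper does not spell them out here, and the real Short List has 2,792 entries.

```python
import re

def percent_unique_uncommon(sample_text, short_list):
    """Per cent of unique words in the sample that are not on the list.

    Each off-list word is counted once, however often it recurs.
    Assumption: the denominator is the total number of words in the sample.
    """
    words = re.findall(r"[A-Za-z']+", sample_text.lower())
    if not words:
        raise ValueError("sample contains no words")
    unique_uncommon = {w for w in words if w not in short_list}
    return 100.0 * len(unique_uncommon) / len(words)

# Hypothetical stand-in for the Short Readability List.
toy_short_list = {"the", "cat", "sat", "on", "mat", "a"}
sample = "The cat sat on the mat. The zebra saw a zebra."
v1 = percent_unique_uncommon(sample, toy_short_list)
```

In this toy sample "zebra" occurs twice but is counted once, which is the distinction between unique and total uncommon words drawn in the paragraph above.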

A third list, containing all first, second, and third grade words, included
1,925 root words and 4,076 inflected forms, totaling 6,001 words. This was
found to be not quite as discriminative at primary levels as the Short List.
For the middle grades the two word lists were about equal, and there were
strong indications that for formulas intended to work above sixth grade level,
the Long List would be better. Only the Short List is used in the formulas
described in this paper.

Variable 2 (V2). Average Sentence Length. The second variable used in the
formulas is obtained by dividing the total number of words in a sample
by the number of sentences, providing the mean number of words per sentence.
This variable is also used in the Lorge, Dale and Chall, and Spache Formulas.

Variable 3 (V3). Per Cent of Long Words. Several ways of measuring word
difficulty directly were tried out. These included counting the number of
words that contain more than five letters and dividing by the number of
words. A word with six or more letters is considered a long word. This
variable is quickly and easily scored by hand as well as by computer.

Variable 4 (V4). Mean Number of Letters per Word. Another way to measure
word difficulty is based on the assumption that the longer a word, the harder
it is likely to be. Average number of letters per word can be scored almost
instantaneously by computer, but is slow and laborious to score by hand.
Since it turned out to be slightly inferior to V3 in predictive power, it
does not appear in our formulas.
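
The three counting variables V2, V3, and V4 can be sketched in a few lines. The sentence-splitting rule (runs of ".", "!" or "?") and the word pattern are simplifying assumptions, not the rules used in the authors' programs.

```python
import re

def counting_variables(sample_text):
    """Return (V2, V3, V4) for one sample.

    V2: mean number of words per sentence.
    V3: per cent of words with six or more letters.
    V4: mean number of letters per word.
    """
    sentences = [s for s in re.split(r"[.!?]+", sample_text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", sample_text)
    if not sentences or not words:
        raise ValueError("sample needs at least one sentence and one word")
    # Count letters only, so an apostrophe does not lengthen a word.
    letters = [sum(ch.isalpha() for ch in w) for w in words]
    v2 = len(words) / len(sentences)
    v3 = 100.0 * sum(n >= 6 for n in letters) / len(words)
    v4 = sum(letters) / len(words)
    return v2, v3, v4

v2, v3, v4 = counting_variables("The children played outside. It was a sunny day!")
```

Real samples would need more careful handling of abbreviations, numerals, and hyphenated words than this sketch attempts.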

Variable 5 (V5). Spelling Patterns. In our preliminary work we discovered
that the per cent of words beginning with the letter e has a substantial
correlation with the difficulty of primary reading materials. Dr. Jacobson
located over 1,400 spelling rules about spellings which occur
characteristically at the beginning, end, or in the middle of words, mainly
from Hanna. These rules were combined to form 101 spelling patterns.
Dr. Jacobson developed computer programs for identifying and counting these
patterns in a sample of reading material, and correlated all 101 patterns
with the criterion of reader level. In a recently published paper (Jacobson,
1974) he reported that a combination of 37 spelling patterns correlated .92
with primary reading difficulty.

Since then, Jacobson has located 12 spelling patterns which, when combined
by multiple correlation, provide amazingly high correlations with reading
level. The 12 best patterns at the primary level have only one pattern in
common with the 12 best patterns for grades 4-6. The per cent of words ending
with a single letter l increases across the full range from preprimer through
sixth reader. Most of the other patterns are effective at primary or
middle-grade level but not at both. There will be further discussion of this
variable later in this paper.

Validity of the Individual Variables 

The first-order Pearson r's with basal reader levels are shown in Table 1,
for grades 1-3, and for grades 1-6. Correlations were also obtained for
grades 4-6 and grades 3-6. Since there was only one reading level per grade
in grades 4-6, the criterion scale provided only three steps for those grades
and only five steps with third grade added; the correlations with those
very coarse criterion scales were low to moderate.

It may be noted that V1 surpasses V2 for grades 1-6, but V2 has the higher
correlation for grades 1-3. V5, which has the highest correlations at both
levels, is at present scorable only by computer. Of the three measures of
word difficulty that are scorable by hand, V1 and V3 have approximately equal
r's for grades 1-3 and V1 is clearly the best for grades 1-6. Variable 3 is
consistently superior to V4.

The relationships of Variables 1, 2, and 3 to reader levels are shown in
Figures 1, 2, and 3, while that for Variable 5 is shown in Figures 4 (primary),
5 (elementary) and 6 (grades 1-6). At each reader level, the steeper the slope
of the line and the smaller the standard deviation, the better the
discriminative power of the variable. Variable 1 (Fig. 1) starts off poorly
at first grade level but does well over the rest of the range. Variable 2
(Fig. 2) shows a fairly consistent upward slope except for poor discrimination
between high third level and fourth reader level. Variable 3 shows steady
upward progression but comparatively large standard deviations. Variable 5
shows steady upward progression for the primary and elementary grades with
lesser discrimination between the high third level and fourth level readers.

Readability Formulas Based on Multiple Correlations

The predictive power of a combination of variables depends not only on the
correlation of each variable with the criterion, but also on the size of the
correlations between the variables; the lower the correlations among variables,
the greater the benefit obtained from combining them. A large number of
two-variable and three-variable combinations have been tried out, and the
five best formulas are reported here. The five formulas in regression equation
form are as follows:



Formula 1.  Readability level = .094 V1 + .168 V2 + .502

Formula 2.  Readability level = .140 V1 + .153 V2 + .560

Formula 3.  Readability level = .158 V2 + .055 V3 + .355

Formula 4.  Readability level = .07t) V1 + .125 V2 + .03? V3 + .W

Formula 5.  Readability level = . liS V1 + .134 V2 + .032 V3 + A?M
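
As a worked illustration, Formulas 1 to 3 can be written as small functions. The coefficients are as read from the regression equations in this section and should be checked against the published directions before any real use; the function names and the example values of V1, V2, and V3 are hypothetical. Formulas 4 and 5 are left out of the sketch. The result is the uncorrected predicted score, which still has to be converted with the correction table (Table 3).

```python
def formula_1(v1, v2):
    """Two-variable formula for primary-grade material (uncorrected score)."""
    return 0.094 * v1 + 0.168 * v2 + 0.502

def formula_2(v1, v2):
    """Two-variable formula for middle-grade material (uncorrected score)."""
    return 0.140 * v1 + 0.153 * v2 + 0.560

def formula_3(v2, v3):
    """Sentence length plus per cent of long words (uncorrected score)."""
    return 0.158 * v2 + 0.055 * v3 + 0.355

# Hypothetical sample: 5 per cent unique uncommon words,
# 8 words per sentence, 10 per cent long words.
score_1 = formula_1(5.0, 8.0)
score_2 = formula_2(5.0, 8.0)
score_3 = formula_3(8.0, 10.0)
```

Formulas 1 and 2 take the same two inputs and differ only in their weights, so both can be computed from one scoring pass over a sample.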

The combination of V1 and V2 provides efficient readability formulas both
for primary-grade material (Formula 1) and middle-grade material (Formula 2).
These formulas employ the same variables but give them somewhat different
weights. Data on the validity and reliability of the five formulas are given
in Table 2. Formulas 1, 3, and 4 are intended for use with material thought
to be below fourth reader level. Formulas 2 and 5 cover the range from
preprimer through six but are recommended for use only when the material is
thought to be above third grade in difficulty.

The multiple correlation coefficients of the five formulas are shown in
the first column of Table 2. The two three-variable formulas are slightly
higher in the value of R than the three two-variable formulas. All of the
R's are quite similar, ranging only from .888 to .918.

The per cent of the total variance accounted for by a multiple correlation
is indicated by R², which is shown in the second column of Table 2. Of the
three primary-level formulas, Formula 4 is 3.6 per cent better than Formula 1,
and Formula 1 is 1.9 per cent better than Formula 3. The standard errors of
estimate, in column 3, are in the same order, with Formula 4 having the
smallest error, Formula 1 next, and Formula 3 the largest of the three.
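
The percentages quoted above follow from squaring the multiple correlations. A short sketch of the arithmetic, using the R values for the three primary-level formulas as read from Table 2 (variable names are illustrative only):

```python
# Multiple correlations (R) for the primary-level formulas, from Table 2.
multiple_r = {"Formula 1": 0.898, "Formula 3": 0.888, "Formula 4": 0.918}

# Per cent of criterion variance accounted for is 100 * R squared.
variance_explained = {name: 100.0 * r ** 2 for name, r in multiple_r.items()}

# Differences in percentage points between formulas.
gain_4_over_1 = variance_explained["Formula 4"] - variance_explained["Formula 1"]
gain_1_over_3 = variance_explained["Formula 1"] - variance_explained["Formula 3"]
```

Because the tabled R values are rounded to three places, differences computed this way can disagree in the last digit with differences taken from the tabled R² values.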

Which of these formulas to use for hand computation depends on one's
priorities. When maximum validity is more important than speed, the
three-variable Formula 4 is the obvious choice. When speed of scoring and
computation is most important, Formula 3 may be chosen. For all-around
efficiency combining good validity with next-best speed and ease of use,
Formula 1 should be preferred.

Formula 2 and Formula 5 are intended for use when the readability of the
material is probably above third grade. In terms of R², Formula 5 is only
.9 per cent better than Formula 2, and the standard errors are very similar.
Formula 2 uses two variables and Formula 5 requires three variables. The
substantial additional time for scoring and computation that Formula 5 requires
does not seem worth the very slight gain in validity. For most hand
computations Formula 2 is recommended for middle-grade material.

For computerized computation and scoring the three-variable formulas are
preferable to the two-variable formulas. Adding a fourth variable does not
provide any further improvement in validity.

The reliabilities of the formulas are also shown in Table 2. They are all
.92 or better, indicating very satisfactory reliability. We recommend taking
five samples from a book, or three samples from a short selection. The
average readability score obtained in either case should be very reliable.
These reliabilities were obtained by separating the samples into random halves,
getting the correlations between the two sets of formula scores, and applying
the Spearman-Brown Formula.
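
The split-half procedure just described can be sketched as follows. The Spearman-Brown step for doubled length is the standard one, r_full = 2r / (1 + r). How the two half-scores were paired is not stated in the paper; the sketch assumes one pair of half-scores per book, and the function names are illustrative.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_brown(r_half):
    """Step a half-length correlation up to the full-length reliability."""
    return 2.0 * r_half / (1.0 + r_half)

def split_half_reliability(half_a_scores, half_b_scores):
    """Correlate the two random halves, then apply Spearman-Brown."""
    return spearman_brown(pearson_r(half_a_scores, half_b_scores))
```

For instance, a correlation of .85 between halves steps up to about .92 for the full set of samples.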

Variable 5, the Spelling Patterns Variable, is not included in Table 2; its
correlations with the criterion are shown on the bottom line of Table 1.
The 12 best spelling patterns for grades 1-3 correlate .93 with the criterion,
higher than any of the correlations in Table 2. The 12 best patterns for
grades 1-6 correlate .913 with the criterion, equal to Formula 4 and better
than the other four formulas. Scoring and computing this variable by hand
would be prohibitively time-consuming. We plan to try to simplify this variable
further. If it can be reduced to four or five components, hand scoring may
become feasible, although it will still be quite laborious. Meanwhile use of
this variable requires employment of the special computer programs developed
by Jacobson. Adding any two of the other four variables does not further improve
the correlation of spelling patterns with basal reader levels.

The reliabilities of the five formulas are shown in the right-hand column
of Table 2, and range from .916 to .947. Since the validity coefficients are
almost as high as the reliabilities, it would be necessary to raise the
reliabilities still higher to achieve further increases in validity.
We have also tried correlating our variables with the 50 per cent and 70
per cent comprehension scores of the McCall-Crabbs Standard Test Lessons
in Reading, the criterion used by Dale and Chall. The correlations were
slightly but consistently higher with the 50 per cent criterion. A combination
of four variables including Spelling Patterns gives a multiple correlation
with McCall-Crabbs of .74, which is about ^> per cent better than the
correlation reported by Dale and Chall for their formula. However, this
readability formula requires the use of special computer programs.

It is our impression, after working with the McCall-Crabbs exercises, that
the grade scores for many of them are inaccurate; the exercises should be
re-standardized to provide a better readability criterion.

Corrected Grade Equivalents

Whenever predicted scores are obtained from a regression equation, the
predicted scores are less variable than the criterion scores. The low scores
are not as low, and the high scores are not as high. Dale and Chall found it
necessary to provide a correction table for their formula scores. For example,
a Dale-Chall obtained score between 7.0 and 7.9 is interpreted as indicating
ninth to tenth grade reading difficulty.

We have found it necessary to provide corrections also, although our
corrections are not large. The corrections for the five H-J formulas are
shown in Table 3. To use this table, first locate the column for
the formula used. Next, locate the score interval into which the
obtained formula score falls. Finally, read horizontally to the left
to find the readability level corresponding to that score interval.
For example, a Formula 5 score of 4.10 falls in the interval 3.77 to 4.23,
which corresponds to high third readability level.
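
The table lookup described above amounts to finding the interval that contains the obtained score. A minimal sketch for Formula 5 follows; the upper bounds are transcribed from the Formula 5 column of Table 3 and should be verified against a clean copy of the table, and the function and constant names are hypothetical.

```python
from bisect import bisect_left

# Upper end of each Formula 5 score interval in Table 3, lowest level first.
FORMULA_5_UPPER_BOUNDS = [1.57, 1.80, 2.08, 2.50, 3.07, 3.76,
                          4.23, 4.81, 5.30, 5.73, 6.08]

READABILITY_LEVELS = ["Preprimer", "Primer", "First reader", "Low second",
                      "High second", "Low third", "High third", "Fourth",
                      "Fifth", "Sixth", "Seventh", "Eighth and up"]

def corrected_level_formula_5(predicted_score):
    """Map an uncorrected Formula 5 score to its corrected readability level."""
    # The table is given to two decimals, so round before the lookup.
    score = round(predicted_score, 2)
    # bisect_left counts the bounds strictly below the score, so a score
    # equal to an upper bound stays in that interval.
    return READABILITY_LEVELS[bisect_left(FORMULA_5_UPPER_BOUNDS, score)]

level = corrected_level_formula_5(4.10)
```

The same approach would serve the other four formulas by swapping in their own columns of upper bounds and, for Formulas 1, 3, and 4, a level list that stops at "Fourth and up".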

Practical Utility

We have confirmed the well-established finding that the most valid single
indicator of readability is a good measure of vocabulary difficulty. Our
best measure of vocabulary difficulty is Spelling Patterns, but that variable
is at present scorable only with the use of special computer programs. Of
the other measures of vocabulary we have tried, the per cent of words not
found in the H-J Short List is the most valid, with per cent of long words
only slightly behind.

Since our formulas provide scores which are not greatly different in validity
from those obtained with the Spache and Dale-Chall formulas, the question of
practical utility becomes important. The H-J Short List has slightly fewer
root words than the revised Spache List. In checking whether a word is
familiar or unfamiliar the V1 score is substantially faster and easier to
obtain than the corresponding Spache score, because a word either is or is
not in the list, while the Spache List requires the application of 11 rules.
V3, the per cent of long words, is even faster to obtain and entails only
a slight loss in validity. Similarly, the Short List is only 30 to 35
per cent as long as the Dale List, which requires the application of 22 rules.
Further Research Needed

The formulas presented here, like the other readability formulas in common
use, are based on characteristics of samples of graded reading material.
It is assumed that the ability of children to understand the material is
directly and closely related to these characteristics. This assumption needs
to be tested.

We are planning a testing program in which sample McCall-Crabbs exercises
will be re-standardized, and the average comprehension scores of children
on them will provide another and perhaps better criterion for validating
or improving our readability formulas.

Full directions for using the Harris-Jacobson Readability Formulas 1 and
2, including a copy of the Short List used in V1, are given in a book to
be published in February, 1975 (Harris and Sipay, 1975). Those who may
be interested in using any of the H-J computerized readability formulas
are invited to get in touch with Dr. Jacobson.

Dale, Edgar, and Chall, Jeanne S. A formula for predicting readability.
Educational Research Bulletin (Ohio State University), Jan. 21, 1948.
Table 1. Correlation coefficients of readability variables
         with basal reader levels

Variable                                         Grades    Grades
                                                  1-3       1-6

V1. % of unique words not in H-J Short List       .797      .863
V2. mean number of words per sentence             .831      .794
V3. % of words having more than 5 letters         .814      .795
V4. mean number of letters per word               .736      .739
V5. spelling patterns                             .930      .915



Table 2. Validity and reliability of five Harris-Jacobson
         Readability Formulas

                       Validity                Reliability
               R        R²       SEest       (Spearman-Brown)

Formula 1     .898     .807      .334             .934
Formula 2     .904     .817      .714             .9^
Formula 3     .888     .788                       .916
Formula 4     .918     .843      .962^            .941
Formula 5              .826      .698             .947
Table 3. Corrected grade equivalents for predicted scores
         on five H-J Readability Formulas

                                          Predicted Score
Readability Level              Formula 1      Formula 3      Formula 4

Preprimer      (1.0  - 1.34)   1.0  - 1.53    1.0  - 1.48    1.0  - 1.49
Primer         (1.35 - 1.64)   1.54 - 1.74    1.49 - 1.80    1.50 - 1.74
First reader   (1.65 - 1.99)   1.75 - 1.98    1.81 - 2.15    1.75 - 2.04
Low second     (2.00 - 2.49)   1.99 - 2.37    2.16 - 2.52    2.05 - 2.47
High second    (2.50 - 2.99)   2.38 - 2.84    2.53 - 2.90    2.48 - 2.89
Low third      (3.00 - 3.49)   2.85 - 3.30    2.91 - 3.16    2.90 - 3.30
High third     (3.50 - 3.99)   3.31 - 3.74    3.17 - 3.40    3.31 - 3.74
Fourth and up  (4.00 +)        3.75 and up    3.41 and up    3.75 and up


                                   Predicted Score
Readability Level              Formula 2      Formula 5

Preprimer      (1.0  - 1.34)   1.0  - 1.63    1.0  - 1.57
Primer         (1.35 - 1.64)   1.64 - 1.83    1.58 - 1.80
First reader   (1.65 - 1.99)   1.84 - 2.07    1.81 - 2.08
Low second     (2.00 - 2.49)   2.08 - 2.42    2.09 - 2.50
High second    (2.50 - 2.99)   2.43 - 2.98    2.51 - 3.07
Low third      (3.00 - 3.49)   2.99 - 3.70    3.08 - 3.76
High third     (3.50 - 3.99)   3.71 - 4.21    3.77 - 4.23
Fourth         (4.00 - 4.99)   4.22 - 4.80    4.24 - 4.81
Fifth          (5.00 - 5.99)   4.81 - 5.28    4.82 - 5.30
Sixth          (6.00 - 6.99)   5.29 - 5.67    5.31 - 5.73
Seventh        (7.00 - 7.99)   5.68 - 6.05    5.74 - 6.08
Eighth and up  (8.00 +)        6.06 and up    6.09 and up



Fig. 1. Variable 1. (Axis label: Basal Reader Levels.)

Fig. 2. Variable 2. Means and standard deviations for mean number